Efficient Algorithms for Outlier-Robust Regression

نویسندگان

  • Adam Klivans
  • Pravesh K. Kothari
  • Raghu Meka
چکیده

We give the first polynomial-time algorithm for performing linear or polynomial regression resilient to adversarial corruptions in both examples and labels. Given a sufficiently large (polynomial-size) training set drawn i.i.d. from distribution D and subsequently corrupted on some fraction of points, our algorithm outputs a linear function whose squared error is close to the squared error of the best-fitting linear function with respect to D, assuming that the marginal distribution of D over the input space is certifiably hypercontractive. This natural property is satisfied by many well-studied distributions such as Gaussian, strongly log-concave distributions and, uniform distribution on the hypercube among others. We also give a simple statistical lower bound showing that some distributional assumption is necessary to succeed in this setting. These results are the first of their kind andwere not known to be even information-theoretically possible prior to our work. Our approach is based on the sum-of-squares (SoS) method and is inspired by the recent applications of the method for parameter recovery problems in unsupervised learning. Our algorithm can be seen as a natural convex relaxation of the following conceptually simple nonconvex optimization problem: find a linear function and a large subset of the input corrupted sample such that the least squares loss of the function over the subset is minimized over all possible large subsets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...

متن کامل

Simultaneous robust estimation of multi-response surfaces in the presence of outliers

A robust approach should be considered when estimating regression coefficients in multi-response problems. Many models are derived from the least squares method. Because the presence of outlier data is unavoidable in most real cases and because the least squares method is sensitive to these types of points, robust regression approaches appear to be a more reliable and suitable method for addres...

متن کامل

Prediction of chronological age based on Demirjian dental age using robust ridge regression method

Introduction: Estimation of age has an important role in legal medicine, endocrine diseases and clinical dentistry. Correspondingly, evaluation of dental development stages is more valuable than tooth erosion. In this research, the modeling of calendar age has been done using new and rich statistical methods. Considerably, it can be considering as a practicable method in medical science that is...

متن کامل

Benchmark testing of algorithms for very robust regression: FS, LMS and LTS

The methods of very robust regression resist up to 50% of outliers. The algorithms for very robust regression rely on selecting numerous subsamples of the data. We describe new algorithms for LMS and LTS estimators that have increased efficiency due to improved combinatorial sampling. These and other publicly available algorithms are compared for outlier detection. An algorithm using the forwar...

متن کامل

SEAR - Scalable, Efficient, Accurate, Robust kNN-based Regression

Regression algorithms are used for prediction (including forecasting of time-series data), inference, hypothesis testing, and modeling of causal relationships. Statistical approaches although popular, are not generic in that they require the user to make an intelligent guess about the form of the regression equation. In this paper we present a new regression algorithm SEAR – Scalable, Efficient...

متن کامل

Fast algorithms for the minimum volume estimator

The MVE estimator is an important tool in robust regression and outlier detection in statistics. We develop fast and efficient algorithms for the MVE estimator problem and discuss how they can be implemented efficiently. The novelty of our approach stems from the recent developments in the first-order algorithms for solving the related Minimum Volume Enclosing Ellipsoid problem. Comparative com...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2018